1. Introduction

Let $w \in A^\omega$ be an infinite word with values in a finite alphabet $A$. The (block) complexity function
$p_w : \mathbb{N} \to \mathbb{N}$ assigns to each $n$ the number of distinct factors of $w$ of length $n$. A fundamental result
due to Hedlund and Morse [?] states that a word $w$ is ultimately periodic if and only if for some $n$ the
complexity $p_w(n) \le n$. Sequences of complexity $p(n) = n + 1$ are called Sturmian words. The most
studied Sturmian word is the so-called Fibonacci word

01001010010010100101001001010010 . . . 

fixed by the morphism $0 \mapsto 01$ and $1 \mapsto 0$. In [?] Hedlund and Morse showed that each Sturmian word
may be realized geometrically by an irrational rotation on the circle. More precisely, every Sturmian
word is obtained by coding the symbolic orbit of a point $x$ on the circle (of circumference one) under a
rotation by an irrational angle $\alpha$, where the circle is partitioned into two complementary intervals, one of
length $\alpha$ and the other of length $1 - \alpha$. Conversely, each such coding gives rise to a Sturmian word.
The irrational $\alpha$ is called the slope of the Sturmian word. An alternative characterization using continued
fractions was given by Rauzy in [?] and [?], and later by Arnoux and Rauzy in [?]. Sturmian words admit
various other characterizations of geometric and combinatorial nature (see for instance [?]). For
example, they are characterized by the following balance property: a word $w$ is Sturmian if and only if
$w$ is a binary aperiodic (non-ultimately periodic) word with the property that for any two factors $u$ and
$v$ of $w$ of equal length, we have $-1 \le |u|_i - |v|_i \le 1$ for each letter $i$. Here $|u|_i$ denotes the number of
occurrences of $i$ in $u$. In this paper, we establish some new characterizations of Sturmian words in terms
of the lexicographic order behavior of their factors. We prove:

Theorem 1.1. An infinite word $w$ containing the letters $0$ and $1$ is Sturmian if and only if for every pair
of lexicographically consecutive factors $v, v'$ of the same length, there exist $\lambda, \mu$ such that $v, v'$ either
both belong to $\{\lambda 01\mu, \lambda 10\mu\}$ or both belong to $\{\lambda 0, \lambda 1\}$.

Actually our first main result is later formulated in more general terms. The fact that this property holds 
for Sturmian words has recently been shown in [?], and is a direct consequence of a result proved in [?]. 
Our second characterization requires the additional hypothesis of recurrence: 

Theorem 1.2. Let $w$ be a recurrent aperiodic binary word over the alphabet $\{0, 1\}$ and $v, v' \in \mathrm{Fact}(w)$.
Then the following are equivalent:

1. $w$ is Sturmian.

2. For all factors $v, v'$ of $w$ of equal length, if $v <_{\mathrm{lex}} v'$ then $|v|_1 \le |v'|_1$.

3. For any pair of lexicographically consecutive factors $v, v'$ of the same length, $v$ and $v'$ differ in at
most two positions.

2. Preliminaries 

In this section, we introduce the tools which will be used in the rest of the paper. 
2.1. Standard notions in combinatorics on words 

We recall here the standard notation and notions in combinatorics on words that will be used in the
rest of the paper. For further results on the subject we refer the reader to [?].

By an alphabet we mean a finite nonempty set $A$. The elements of $A$ are called letters. We let
$A^*$ denote the free monoid over $A$, i.e. the set of finite sequences of elements of $A$ equipped with the
concatenation product. The neutral element of $A^*$ will be called the empty word and is denoted $\varepsilon$. The
set of nonempty words over $A$, i.e. the free semigroup over $A$, is denoted $A^+$. With the multiplicative
notation, given a positive integer $n$ and a word $w$, we let $w^n$ denote the concatenation of $n$ copies of $w$.
For each word $w$, we put $w^0 = \varepsilon$.

Two words $v, v'$ are said to be conjugates of one another if there exist $\lambda, \mu$ such that $v = \lambda\mu$ and
$v' = \mu\lambda$.

If a nonempty word $x$ is such that $x = x_1 x_2 \cdots x_k$, with $x_i \in A$ for $1 \le i \le k$, then $k$ is called the
length of $x$ and is denoted $|x|$. The length of the empty word is taken to be $0$.

We say that a word $v$ is a factor of another word $w$ if there exist two words $\lambda, \mu$ such that $w = \lambda v \mu$.
If $\lambda = \varepsilon$ (resp. $\mu = \varepsilon$) we call $v$ a prefix (resp. a suffix) of $w$. If $v$ is both a prefix and a suffix of $w$, we
say that $v$ is a border of $w$. A factor $v$ of $w$ is called proper if $|v| < |w|$. We denote by $\mathrm{Fact}(w)$ the set of
all factors of the word $w$. A word $w$ is said to be unbordered if the only borders of $w$ are $w$ and $\varepsilon$.

Most of the above definitions can be extended to the set $A^\omega$ of infinite words on the alphabet $A$. For
$w, w' \in A^\omega$, we say $w'$ is a tail of $w$ if $w = vw'$ for some $v \in A^*$. If $v$ is not empty, we call $w'$ a proper
tail of $w$.

We call an occurrence of $v$ in $w$ a word $\lambda$ such that $\lambda v$ is a prefix of $w$. An infinite word $w$ is said to
be recurrent if each of its factors (or, equivalently, each of its prefixes) has infinitely many occurrences in $w$.
Given $v, w \in A^*$ we let $|w|_v$ denote the number of occurrences of $v$ in $w$ and set

$\mathrm{Alph}(w) = \{x \in A \mid |w|_x > 0\}$.

A factor $v$ of $w$ is unioccurrent if $|w|_v = 1$, i.e., if $v$ occurs in $w$ exactly once.

We say that an infinite word $w$ is periodic if it can be expressed as an infinite concatenation of a
finite word $v$, i.e. $w = v^\omega$. We say that an infinite word is ultimately periodic if it has a periodic tail.
Otherwise we say $w$ is aperiodic. It is easy to show that any infinite word that contains itself as a proper
tail is periodic.

2.2. Lexicographic order 

Let $A$ be an alphabet equipped with a total order $<$. Then $<$ extends naturally to a partial order on $A^*$,
denoted $<_{\mathrm{lex}}$, in the following way: we write $v <_{\mathrm{lex}} v'$ (and say $v$ is lexicographically smaller than $v'$)
if $|v| = |v'|$ and there exist a word $\lambda$ and two letters $a < b$ such that $\lambda a$ is a prefix of $v$ and $\lambda b$ is a prefix
of $v'$. Two words $v, v'$ are said to be lexicographically consecutive or adjacent if $v <_{\mathrm{lex}} v'$ and there is
no word $w$ such that $v <_{\mathrm{lex}} w <_{\mathrm{lex}} v'$.

We say a factor $v$ of a word $w$ is maximal (resp. minimal) in $w$ if there exists no factor $v'$ of $w$ such that
$v <_{\mathrm{lex}} v'$ (resp. $v' <_{\mathrm{lex}} v$), thus omitting the phrase "with respect to the lexicographic order". We will
say that $v$ is extremal in $w$ if it is either minimal or maximal.

Given two factors $v, v'$ of a word $w$ such that $v <_{\mathrm{lex}} v'$, we will write $v' = \mathrm{succ}_w(v)$ if there is no
$f \in \mathrm{Fact}(w)$ such that $v <_{\mathrm{lex}} f <_{\mathrm{lex}} v'$. Notice that if $v \in \mathrm{Fact}(w)$ is non-extremal, then there exist
$f_1, f_2 \in \mathrm{Fact}(w)$ such that $f_1 = \mathrm{succ}_w(v)$ and $v = \mathrm{succ}_w(f_2)$.
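As an illustration of the successor relation, here is a minimal Python sketch that lists the factors of a given length in lexicographic order and reads off the successor. It works on a finite word, so it only approximates the definition for infinite words: the answer is reliable when the finite prefix is long enough to contain every factor of the chosen length. The names `factors`, `succ` and `FIB_PREFIX` are illustrative choices and not notation from the paper; the prefix is the one displayed in the Introduction.

```python
from typing import List, Optional

# Prefix of the Fibonacci word as displayed in the Introduction.
FIB_PREFIX = "01001010010010100101001001010010"


def factors(word: str, n: int) -> List[str]:
    """Distinct length-n factors of a finite word, in lexicographic order.

    For equal-length strings over the digits '0' < '1', Python's string
    comparison coincides with the lexicographic order used in the text.
    """
    return sorted({word[i:i + n] for i in range(len(word) - n + 1)})


def succ(word: str, v: str) -> Optional[str]:
    """Successor of the factor v among the factors of word of length |v|.

    Returns None when v is maximal. Raises ValueError if v is not a factor.
    """
    fs = factors(word, len(v))
    i = fs.index(v)
    return fs[i + 1] if i + 1 < len(fs) else None


if __name__ == "__main__":
    print(factors(FIB_PREFIX, 5))
    print(succ(FIB_PREFIX, "00100"))
```

In this sample the minimal factor of length 5 is `00100`, and a maximal factor has no successor, which matches the remark that only non-extremal factors have both a successor and a predecessor.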

Remark 2.1. It is easy to show that if $v$ is a unioccurrent prefix of an infinite word $w$ and $v$ is extremal
in $w$, then every prefix of $w$ longer than $v$ is unioccurrent, and extremal of the same kind.

We can extend the definition of lexicographic order to infinite words in a natural way, saying that the
infinite word $w$ is lexicographically smaller than $w'$ if $w$ has a prefix which is lexicographically smaller
than a prefix (of the same length) of $w'$. The notion of extremality extends as well: we say that an infinite
word $w$ is minimal (resp. maximal) if it is lexicographically smaller (resp. larger) than all its tails.

Remark 2.2. It is clear that if $aw$ and $w$ are both extremal infinite binary words (and $a$ is a letter), then
they are extremal of the same kind (i.e. they are both minimal or both maximal).

2.3. Sturmian words 

Let $v$ and $v'$ be factors of $w$ with $|v| = |v'|$. We say the pair $(v, v')$ is balanced if
$\bigl|\,|v|_x - |v'|_x\,\bigr| \le 1$ for each letter $x \in A$. Otherwise the pair $(v, v')$ is said to be imbalanced.
A word $w$ is called balanced if all pairs of factors of $w$ of the same length are balanced.
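The balance condition can be tested mechanically on a finite sample. The following is a minimal sketch, assuming a binary alphabet written with the characters `0` and `1`; the helper names are hypothetical and the check only covers factors up to a chosen length, so a `True` result is evidence of balance for the sample and not a proof for an infinite word.

```python
# Prefix of the Fibonacci word as displayed in the Introduction.
FIB_PREFIX = "01001010010010100101001001010010"


def factor_set(word: str, n: int) -> set:
    """Set of distinct factors of length n of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def is_balanced_pair(v: str, vp: str, alphabet: str = "01") -> bool:
    """True if the letter counts of v and vp differ by at most 1 for each letter."""
    return all(abs(v.count(x) - vp.count(x)) <= 1 for x in alphabet)


def is_balanced(word: str, max_len: int, alphabet: str = "01") -> bool:
    """Check balance for all pairs of factors of equal length up to max_len.

    It suffices to compare the largest and smallest letter counts per length.
    """
    for n in range(1, max_len + 1):
        fs = factor_set(word, n)
        if not fs:  # n exceeds the length of the sample
            break
        for x in alphabet:
            counts = [f.count(x) for f in fs]
            if max(counts) - min(counts) > 1:
                return False
    return True


if __name__ == "__main__":
    print(is_balanced(FIB_PREFIX, 8))   # Fibonacci prefix
    print(is_balanced("0011", 2))       # contains the imbalanced pair (00, 11)
```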

A binary word $w$ is called Sturmian if $w$ is aperiodic and balanced. As mentioned earlier, Sturmian
words are also defined in terms of the block complexity function $p_w : \mathbb{N} \to \mathbb{N}$ which assigns to each $n$
the number of distinct factors of $w$ of length $n$: $w$ is Sturmian if and only if $p_w(n) = n + 1$ for each
$n \ge 0$.
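The complexity characterization is easy to observe numerically. The sketch below generates a prefix of the Fibonacci word by iterating the morphism $0 \mapsto 01$, $1 \mapsto 0$ and counts distinct factors. The function names are illustrative, and the count is taken on a finite prefix, which is assumed here to be long enough to contain every factor of the lengths tested.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, obtained by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        # Applying the morphism to a prefix of the fixed point gives a longer prefix.
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def complexity(word: str, n: int) -> int:
    """Number of distinct factors of length n in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


if __name__ == "__main__":
    fib = fibonacci_prefix(1000)
    for n in range(1, 11):
        print(n, complexity(fib, n))
    # A periodic word for comparison: its complexity stays bounded.
    print(complexity("01" * 50, 3))
```

For the periodic word $(01)^\omega$ the count is $2 \le n$ as soon as $n \ge 2$, in line with the Hedlund and Morse criterion for ultimate periodicity.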

For each Sturmian word $w \in \{0, 1\}^\omega$ we set

$\Omega_w = \{w' \in \{0, 1\}^\omega \mid \mathrm{Fact}(w') = \mathrm{Fact}(w)\}$.

Thus $\Omega_w$ is the shift orbit closure or subshift generated by $w$. The proof of the following proposition is
in [?].

Proposition 2.3. Let $w$ be a Sturmian word over the alphabet $\{0, 1\}$. Then there exists a unique word $\gamma$
in $\Omega_w$ such that both $0\gamma$ and $1\gamma$ are in $\Omega_w$.

Remark 2.4. The word $\gamma$ in Proposition 2.3 is called the characteristic word of $w$ and it is known that
the prefixes of $0\gamma$ are lexicographically minimal among the factors of $w$, while the prefixes of $1\gamma$ are
maximal.

We say that a factor $v$ of a Sturmian word $w$ is a Christoffel word if $v$ is unbordered. We group into
the next statement the well-known properties of Christoffel words that we will need in the rest of the
paper (see for instance [?, ?], [?, Prop. 5], [?, Prop. 6]).

Proposition 2.5. Let $w$ be a Sturmian word over the alphabet $\{0, 1\}$ and let $v \in \mathrm{Fact}(w)$ be a Christoffel
word such that $|v| > 1$. Then there exists $u$ such that:

1. $v$ is either $0u1$ or $1u0$, and they are both Christoffel words in $w$;

2. $0u1$ and $1u0$ are the only Christoffel words of length $|v|$ in $w$ and are conjugates;

3. all conjugates of $v$ are in $\mathrm{Fact}(w)$;

4. exactly one of $0u0$ and $1u1$ is a factor of $w$, and it is extremal in $w$;

5. the factors of $w$ of length $|v|$ are either conjugates of $v$ or of type $xux$.

Those factors of a Sturmian word having the same length as a Christoffel word, but not conjugate to
a Christoffel word (i.e. the factors $xux$ in the preceding proposition), are called singular words of the
Sturmian word.
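A small computation illustrates Proposition 2.5 on the Fibonacci word for factors of length 5. The sketch below is only a sanity check on one example, with hypothetical helper names; it uses the prefix displayed in the Introduction, which happens to contain all six factors of length 5.

```python
# Prefix of the Fibonacci word as displayed in the Introduction.
FIB_PREFIX = "01001010010010100101001001010010"


def factors(word: str, n: int) -> list:
    """Distinct length-n factors of a finite word, in lexicographic order."""
    return sorted({word[i:i + n] for i in range(len(word) - n + 1)})


def borders(v: str) -> list:
    """All borders of v, including the empty word and v itself."""
    return [v[:k] for k in range(len(v) + 1) if v[:k] == v[len(v) - k:]]


def is_unbordered(v: str) -> bool:
    """True if the only borders of the nonempty word v are the empty word and v."""
    return len(borders(v)) == 2


def conjugates(v: str) -> set:
    """All rotations of v."""
    return {v[i:] + v[:i] for i in range(len(v))}


if __name__ == "__main__":
    fs = factors(FIB_PREFIX, 5)
    christoffel = [f for f in fs if is_unbordered(f)]
    singular = [f for f in fs if f not in conjugates(christoffel[0])]
    print("factors:    ", fs)
    print("Christoffel:", christoffel)   # the words 0u1 and 1u0 with u = 010
    print("singular:   ", singular)      # the word 0u0
```

Here $u = 010$: the two unbordered factors are $0u1 = 00101$ and $1u0 = 10100$, their conjugates account for five of the six factors, and the remaining factor $0u0 = 00100$ is the singular word, which is also the lexicographically minimal factor of that length.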



3. Main Result

We begin with the following key proposition:

Proposition 3.1. Let $w \in \{0, 1\}^\omega$ be an imbalanced word. Then there exists a factor $u \in \mathrm{Fact}(w)$ of
minimal length such that $0u0, 1u1$ are in $\mathrm{Fact}(w)$. Furthermore, either $10u0$ and $01u1$ are both factors
of $w$ or there exists a unique letter $x$ such that $xux$ is a prefix of $w$ and occurs in $w$ only finitely many
times. In the latter case every prefix of $w$ is extremal in $w$.

Proof:

Since $w$ is not balanced, there exists an imbalanced pair $(v, v')$ consisting of factors $v$ and $v'$ of $w$. It
is well known (see [?]) that the imbalanced pair of minimal length is of the form $(0u0, 1u1)$ for some
factor $u$ of $w$ and is unique. If both $10u0$ and $01u1$ are factors of $w$ we are done. So let us assume that there
exists a letter $x \in \{0, 1\}$ such that no occurrence of $xux$ in $w$ is preceded by $1 - x$. Then every internal
(non-prefix) occurrence of $xux$ in $w$ is preceded by $x$. We begin by showing that $xux$ is a prefix of $w$,
from which it follows that $x$ is unique. Without loss of generality we can assume that $x = 0$. Suppose
that the first occurrence of $0u0$ in $w$ occurs in position $n \ge 0$. If $n = 0$ we are done. So suppose
$n > 0$. Then $00u$ is a factor of $w$ occurring in position $n - 1$ and the pair $(00u, 1u1)$ is imbalanced.
By uniqueness of the shortest imbalanced pair we have that $00u = 0u0$ and hence $0u0$ also occurs in
position $n - 1$, contradicting the minimality of $n$. This also shows that if $0u0$ occurs in position $t$
then it also occurs in each position $r$ for $0 \le r \le t$. Thus $0u0$ occurs only finitely many times in $w$ (for
otherwise $w$ would be $0^\omega$ and thus not binary).

We next show that every prefix of $w$ is minimal (if we had taken $x = 1$ then each prefix of $w$ would
be maximal). We proceed by contradiction. Let $n > 0$ be the least positive integer for which there exists
a factor $v'$ of $w$ in position $n$ which is lexicographically smaller than the corresponding prefix $v$ of $w$ of
the same length. Then either there exists a proper prefix $u'$ of $u$ such that $0u'1$ is a prefix of $v$ and $0u'0$
is a prefix of $v'$, or $v'$ begins in $0u0$. In the first case $0u'0$ and the prefix $1u'1$ of $1u1$ constitute a shorter
imbalanced pair, contradicting the minimality of $|u|$. In the second case $v'$ is an internal occurrence of
$0u0$ and is hence preceded by $0$. Thus the factor $v''$ in position $n - 1$ of length $|v'|$ is lexicographically
smaller than $v'$ and also smaller than $v$, contradicting the minimality of $n$. □

The next proposition introduces the main subject of this paper:

Proposition 3.2. Fix $k \ge 1$. Let $A = \{0, 1, \ldots, k\}$ be an ordered alphabet such that $0 < 1 < \cdots < k$
and $w$ an infinite word such that $\mathrm{Alph}(w) = A$. The following are equivalent:

1. For every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist distinct letters $a < b$ in $A$ and $\lambda, \mu \in A^*$
such that

$$\begin{cases} v = \lambda ab\mu \\ v' = \lambda ba\mu \end{cases} \quad\text{or}\quad \begin{cases} v = \lambda a \\ v' = \lambda b \end{cases}$$

2. For every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist $m \in A$ and $\lambda, \mu \in A^*$ such that

$$\begin{cases} v = \lambda m(m+1)\mu \\ v' = \lambda (m+1)m\mu \end{cases} \quad\text{or}\quad \begin{cases} v = \lambda m \\ v' = \lambda (m+1) \end{cases}$$

3. $A = \{0, 1\}$ and for every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist $\lambda, \mu \in A^*$ such that

$$\begin{cases} v = \lambda 01\mu \\ v' = \lambda 10\mu \end{cases} \quad\text{or}\quad \begin{cases} v = \lambda 0 \\ v' = \lambda 1 \end{cases}$$
Proof:

Clearly (3) $\Rightarrow$ (2) $\Rightarrow$ (1). To see that (1) $\Rightarrow$ (3) it suffices to show that (1) implies that $k = 1$. We first
note that if $ab \in \mathrm{Fact}(w)$, then $b \in \{a - 1, a, a + 1\}$. In fact, suppose that $a \ne b$. Then either $a < b$ or
$b < a$. We consider the first case, as the latter is essentially identical. Let $x, y \in A$ be such that $ax$ is
the greatest factor of length 2 beginning with $a$ and $(a + 1)y$ is the smallest factor of length 2 beginning
with $(a + 1)$. Clearly $(a + 1)y = \mathrm{succ}_w(ax)$, which, from the hypothesis, implies that $x = a + 1$ and
$y = a$. Thus $ab$ is lexicographically smaller than or equal to $a(a + 1)$, from which it follows that $b = a + 1$.

Now suppose to the contrary that $k > 1$, and consider the shortest factor $v$ of $w$ containing both $0$
and $2$. Then, from what we just proved, $v = 01^n2$ or $v = 21^n0$ for some $n > 0$. We will show that
neither occurs in $w$. Suppose to the contrary that the first is a factor of $w$ and consider the least $n > 0$ for
which $01^n2$ is a factor of $w$. Then as $01^n2$ is not maximal, its successor is either of the form $101^{n-1}2$
or $01^{n-1}21$ or $01^nx$ for some $2 < x$. The first two cases contradict the minimality of $n$, while the last
case implies that $1x$ is a factor of $w$ for some $2 < x$, again a contradiction. Similarly it is verified that
$v = 21^n0$ is never a factor of $w$. Hence $k = 1$. □

Definition 3.3. We say that an infinite word $w$ has the "Nice Factors Ordering property" (NFOp) if
one of the equivalent conditions of Proposition 3.2 holds for $w$.

Remark 3.4. It is useful to stress that having the NFOp implies that the word $w$ is actually binary.
Also it is easy to see that NFOp actually characterizes the pairs of adjacent factors with respect to the
lexicographic ordering, i.e., if $w$ satisfies NFOp and $v$ and $v'$ are factors of $w$ with $v = \lambda 01\mu$ and
$v' = \lambda 10\mu$, or $v = \lambda 0$ and $v' = \lambda 1$, then $v' = \mathrm{succ}_w(v)$.
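To make the NFOp concrete, the following minimal sketch tests condition (3) of Proposition 3.2 on the factors of a finite binary word, up to a chosen length. The function names are assumptions made for illustration. Since only a finite prefix is examined, the result reflects the infinite word only when the prefix contains all of its factors of the lengths tested.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, obtained by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def is_nfo_pair(v: str, vp: str) -> bool:
    """True if (v, vp) has the form (x01y, x10y) or (x0, x1) for some words x, y."""
    diff = [i for i in range(len(v)) if v[i] != vp[i]]
    if len(diff) == 1:
        i = diff[0]
        # The single difference must be the last letter: 0 in v, 1 in vp.
        return i == len(v) - 1 and v[i] == "0" and vp[i] == "1"
    if len(diff) == 2:
        i, j = diff
        # The two differences must be adjacent: 01 in v, 10 in vp.
        return j == i + 1 and v[i:j + 1] == "01" and vp[i:j + 1] == "10"
    return False


def has_nfop_up_to(word: str, max_len: int) -> bool:
    """Check the NFOp pattern for consecutive factors of each length <= max_len."""
    for n in range(1, max_len + 1):
        fs = sorted({word[i:i + n] for i in range(len(word) - n + 1)})
        if any(not is_nfo_pair(a, b) for a, b in zip(fs, fs[1:])):
            return False
    return True


if __name__ == "__main__":
    print(has_nfop_up_to(fibonacci_prefix(2000), 12))  # Sturmian sample
    print(has_nfop_up_to("01" * 20, 3))                # periodic sample
```

The periodic sample fails at length 3, where the only factors $010$ and $101$ are consecutive but differ in three positions.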

Lemma 3.5. If an infinite word $w$ has the NFOp, then $w$ is aperiodic.

Proof:

Let us assume by contradiction that there exist $w', v \in A^*$ with $w = w'v^\omega$. Then $w$ has finitely many
tails and it is readily proved that they must respect the NFOp, i.e. if $x$ and $y$ are two lexicographically
consecutive tails of $w$ then we can write

$x = z01z'$, $y = z10z'$.

In particular, this implies that every tail contains either $01$ or $10$, hence $v$ cannot be a single letter. As
$v^\omega$ contains both $01$ and $10$, $w$ has tails of the form $\alpha = (01v')^\omega$ and $\beta = (10v'')^\omega$ for some $v', v''$ with
$|v'| = |v''| = |v| - 2$. Clearly these two tails differ in an infinite number of positions. On the other hand,
$w$ has only a finite number of tails and by assumption any two lexicographically consecutive tails differ
in exactly two positions. Hence we obtain a contradiction. □

Lemma 3.6. Let $w$ be an infinite word with the NFOp. Then there exists no factor $u$ in $\mathrm{Fact}(w)$ such
that $10u0$ and $01u1$ are both factors of $w$.

Proof:

Suppose to the contrary that there exists a shortest factor $u$ such that both $10u0$ and $01u1$ are factors
of $w$. Since $01u1 <_{\mathrm{lex}} 10u0$, but the two factors cannot be consecutive as they differ in at least three
positions, the successor $v$ of $01u1$ satisfies $01u1 <_{\mathrm{lex}} v <_{\mathrm{lex}} 10u0$. It follows that there exists a proper
prefix $\lambda$ of $u$ such that $01\lambda 0$ is a prefix of $01u1$ and $01\lambda 1$ is a prefix of $v$ (notice that $v$ cannot begin with
$1$ since otherwise it would be $10u1$ and thus would be lexicographically larger than $10u0$). Since $10\lambda 0$
is a prefix of $10u0$, the factors $01\lambda 1$ and $10\lambda 0$ contradict the minimality of $|u|$. □

The following result is a direct consequence of a result proved by the third author together with 
Jenkinson in [?] and, more recently, has appeared in [?]; we include it here with a different proof, for the 
sake of completeness. 

Proposition 3.7. Let $w$ be a Sturmian word on the alphabet $\{0, 1\}$. Then $w$ satisfies NFOp.

Proof:

Let $0u1$ be a Christoffel factor of $w$. As $1u0$ is a conjugate of $0u1$ it follows that each factoring $u = xy$
determines two conjugates of $0u1$, namely $v = y01x$ and $v' = y10x$. By Proposition 2.5, $v$ and $v'$ are
factors of $w$; let $z = \mathrm{succ}_w(v)$. As $v <_{\mathrm{lex}} v'$, the longest common prefix of $v$ and $z$ is at least $y$. In fact
it cannot be longer, otherwise we could write $v = y01x'0\lambda$ and $z = y01x'1\mu$ for some words $x', \lambda, \mu$; as
$v' = y10x'0\lambda$, we would have $0x'0, 1x'1 \in \mathrm{Fact}(w)$, a contradiction since $w$ is balanced.

Similarly, $y$ is also the longest common prefix between $v'$ and the word $z'$ such that $\mathrm{succ}_w(z') = v'$.
It follows that $z = v'$ and $v = z'$, i.e., $v$ and $v'$ are lexicographically consecutive. Thus any two consecutive
conjugates of a Christoffel word in $w$ satisfy the first condition in (3) of Proposition 3.2. More generally,
if $z$ and $z'$ are lexicographically consecutive factors of $w$, then there exist a Christoffel factor $0u1$ and
two consecutive conjugates $v$ and $v'$ of $0u1$ with $z$ a prefix of $v$ and $z'$ a prefix of $v'$. The result now
follows. □

Before proceeding to prove our main result, we need the following: 

Lemma 3.8. Let $w \in \{0, 1\}^\omega$ be a Sturmian word and $x \in \{0, 1\}$. If $xw$ satisfies NFOp then $xw$ is
Sturmian.

Proof:

We proceed by contradiction by supposing that $w$ is Sturmian, $xw$ satisfies NFOp and that $xw$ is not
Sturmian. Without loss of generality we can assume that $x = 0$. It follows that there exists $u$ such that
both $0u0$ and $1u1$ are factors of $0w$. Since $w$ is Sturmian, it follows from Proposition 3.1 that $0u0$ is a
unioccurrent prefix of $0w$ and every prefix of $0w$ is minimal in $0w$. On the other hand, if $\gamma$ denotes the
characteristic word of $w$ (which has $u$ as a prefix), then every prefix of $0\gamma$ is minimal in $w$. By NFOp
it follows that for all $n > |u| + 2$ the prefixes of length $n$ of $0w$ and $0\gamma$ can be written respectively as
$0u01v_n$ and $0u10v_n$ for some word $v_n$. Hence there exists a tail $v$ of $w$ such that $0w = 0u01v$ and
$0\gamma = 0u10v$. Thus $0v, 1v \in \Omega_w$, so that $v = \gamma$ and hence $\gamma$ is a proper tail of itself, a contradiction. □

Theorem 3.9. Let $w$ be an infinite word on the ordered alphabet $A = \{0, 1, \ldots, k\}$. The following
statements are equivalent:

1. $w$ is Sturmian over the alphabet $\{0, 1\}$.

2. $w$ satisfies NFOp.

Proof:

That (1) $\Rightarrow$ (2) follows from Proposition 3.7. To see that (2) $\Rightarrow$ (1), we suppose that $w \in \{0, 1\}^\omega$
satisfies NFOp and write $w = aw'$ with $a \in \{0, 1\}$. We need to show that $w$ is Sturmian. By Lemma 3.5,
$w$ is aperiodic. If $w$ is not Sturmian, then by Lemma 3.8 we deduce that $w'$ is not Sturmian. Also,
combining Proposition 3.1 and Lemma 3.6 we deduce that every prefix of $w$ is extremal and hence $w'$
also satisfies NFOp. In short, if $w = aw'$ satisfies NFOp and is not Sturmian, then every prefix of $w$ is
extremal and the tail $w'$ satisfies NFOp and is not Sturmian. Thus writing $w' = bw''$ we deduce that every
prefix of $w'$ is extremal and $w''$ satisfies NFOp and is not Sturmian. Iterating this process indefinitely we
deduce that for each tail $v$ of $w$, each prefix of $v$ is extremal in $v$. Since $w$ is aperiodic it follows that
there exists a tail $v$ of $w$ which begins in $01$ and a tail $v'$ of $v$ which begins in $10$. Since every prefix of $v$
is minimal in $v$ and every prefix of $v'$ is maximal in $v'$, it follows that $00$ is not a factor of $v$ and $11$ is not
a factor of $v'$. Hence $v' = (10)^\omega$, a contradiction. □

We next establish another characterization of Sturmian words based on the lexicographic order of 
their factors. 

Theorem 3.10. Let $w$ be a recurrent aperiodic binary word over the alphabet $\{0, 1\}$. Then the following
are equivalent:

1. $w$ is Sturmian.

2. For all factors $v, v' \in \mathrm{Fact}(w)$ of equal length, if $v' = \mathrm{succ}_w(v)$ then $v$ and $v'$ differ in at most
two positions.

3. For all factors $v, v' \in \mathrm{Fact}(w)$ of equal length, if $v <_{\mathrm{lex}} v'$ then $|v|_1 \le |v'|_1$.

Proof:

(1) $\Rightarrow$ (2): Since $w$ is Sturmian, it has the NFOp by Theorem 3.9. The statement follows since
the NFOp trivially implies condition (2) by definition.

(2) $\Rightarrow$ (3): Notice that condition (2) implies that if $f' = \mathrm{succ}_w(f)$, then there must exist words
$\lambda, \mu, \mu', x, x'$ with $|x| = |x'| \le 1$ such that $f = \lambda 0\mu x\mu'$ and $f' = \lambda 1\mu x'\mu'$. Hence

$$|f|_1 = |\lambda|_1 + |\mu|_1 + |\mu'|_1 + |x|_1 \le |\lambda|_1 + |\mu|_1 + |\mu'|_1 + 1 \le |\lambda|_1 + |\mu|_1 + |\mu'|_1 + |x'|_1 + 1 = |f'|_1.$$

Thus, in particular, $|f|_1 \le |f'|_1$. Suppose $v <_{\mathrm{lex}} v'$. Then there must exist $v_0, \ldots, v_k$ such that
$v = v_0$, $v' = v_k$ and for all $1 \le n \le k$, $v_n = \mathrm{succ}_w(v_{n-1})$, so that

$$|v|_1 = |v_0|_1 \le \cdots \le |v_k|_1 = |v'|_1.$$

(3) $\Rightarrow$ (1): Assume $w$ is not Sturmian; as $w$ is aperiodic, it has to be imbalanced. Since $w$ is
recurrent, we have from Proposition 3.1 that there must exist $u$ such that both $10u0$ and $01u1$ are factors
of $w$. But clearly this is a contradiction, since $01u1 <_{\mathrm{lex}} 10u0$ and $|01u1|_1 = |10u0|_1 + 1$. This
concludes the proof. □

Notice that, as opposed to Theorem 3.9, the above result actually needs the recurrence and aperiodicity
hypotheses, as for example:

• the recurrent periodic word $(01)^\omega$ and the non-recurrent aperiodic word $00f$ (where $f$ is the
Fibonacci word) both respect condition (3), although neither is Sturmian,

• the non-recurrent ultimately periodic word $01^\omega$ satisfies both (2) and (3), but it is not Sturmian.
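These examples can be checked experimentally. The sketch below tests conditions (2) and (3) of Theorem 3.10 on finite prefixes of the three words, for factors up to a small length. The function names and the prefix and factor lengths are arbitrary choices for illustration, and a finite check of this kind supports the claims without proving them for the infinite words.

```python
def fibonacci_prefix(length: int) -> str:
    """Prefix of the Fibonacci word, obtained by iterating 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def sorted_factors(word: str, n: int) -> list:
    """Distinct length-n factors of a finite word, in lexicographic order."""
    return sorted({word[i:i + n] for i in range(len(word) - n + 1)})


def hamming(v: str, vp: str) -> int:
    """Number of positions in which two words of equal length differ."""
    return sum(a != b for a, b in zip(v, vp))


def condition2(word: str, max_len: int) -> bool:
    """Consecutive factors differ in at most two positions (lengths <= max_len)."""
    return all(
        hamming(a, b) <= 2
        for n in range(1, max_len + 1)
        for fs in [sorted_factors(word, n)]
        for a, b in zip(fs, fs[1:])
    )


def condition3(word: str, max_len: int) -> bool:
    """The number of 1s is non-decreasing along the lexicographic order."""
    return all(
        a.count("1") <= b.count("1")
        for n in range(1, max_len + 1)
        for fs in [sorted_factors(word, n)]
        for a, b in zip(fs, fs[1:])
    )


if __name__ == "__main__":
    samples = {
        "(01)^omega": "01" * 30,
        "00f": "00" + fibonacci_prefix(500),
        "01^omega": "0" + "1" * 40,
        "f": fibonacci_prefix(500),
    }
    for name, word in samples.items():
        print(name, condition2(word, 8), condition3(word, 8))
```

For $00f$, condition (2) already fails at length 6: the prefix $000100$ is immediately followed in the lexicographic order by $001001$, and the two differ in three positions.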
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complexity p w (n) < n. Sequences of complexity p(n) = n + 1 are called Sturmian words. The most 
studied Sturmian word is the so-called Fibonacci word 

01001010010010100101001001010010 . . . 

fixed by the morphism i — > 01 and 1 i-> 0. In J9| Hedlund and Morse showed that each Sturmian word 
may be realized geometrically by an irrational rotation on the circle. More precisely, every Sturmian 
word is obtained by coding the symbolic orbit of a point x on the circle (of circumference one) under a 
rotation by an irrational angle a where the circle is partitioned into two complementary intervals, one of 
length a and the other of length 1 — a. And conversely each such coding gives rise to a Sturmian word. 
The irrational a is called the slope of the Sturmian word. An alternative characterization using continued 
fractions was given by Rauzy in [11] and 02L and later by Arnoux and Rauzy in HI. Sturmian words 
admit various other types of characterizations of geometric and combinatorial nature (see for instance 
Il3l0. For example they are characterized by the following balance property: A word w is Sturmian if and 
only if w is a binary aperiodic (non-ultimately periodic) word with the property that for any two factors u 
and v of w of equal length, we have — 1 < \u\i — \v\i < 1 for each letter i. Here \u\i denotes the number 
of occurrences of i in u. In this paper, we establish some new characterizations of Sturmian words in 
terms of the lexicographic order behavior of its factors. We prove: 

Theorem 1.1. An infinite word w containing the letters and 1 is Sturmian if and only if for every pair 
of lexicographically consecutive factors v, v' of the same length, there exist A, fi such that v, v' either 
both belong to {A01/i, A10/x} or both belong to {AO, Al}. 

Actually our first main result is later formulated in more general terms. The fact that this property holds 
for Sturmian words has recently been shown in iTTOl . and is a direct consequence of a result proved in 
0. 

Our second characterization requires the additional hypothesis of recurrence: 

Theorem 1.2. Let w be a recurrent aperiodic binary word over the alphabet {0, 1} and v, v' G Fact(u>). 
Then the following are equivalent: 

1. tt; is Sturmian. 

2. For all factors v, v' of w of equal length, if v <\ cx v' then \v\\ < \v'\±. 

3. For any pair of lexicographically consecutive factors v, v' of the same length, v and v' differ in at 
most two positions. 

2. Preliminaries 

In this section, we introduce the tools which will be used in the rest of the paper. 
2.1. Standard notions in combinatorics on words 

We will report here the standard notations and notions in combinatorics on words that will be used in the 
rest of the paper. For further results on the subject we refer the reader to 0. 
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By an alphabet we mean a finite non empty set A. The elements of A are called letters. We let 
A* denote the free monoid over A, i.e. the set of finite sequences of elements of A equipped with the 
concatenation product. The neutral element of A* will be called the empty word and is denoted e. The 
set of nonempty words over A, i.e. the free semigroup over A, is denoted A + . With the multiplicative 
notation, given a positive integer n and a word w, we let w n denote the concatenation of n copies of w. 
For each word w, we put w° = e. 

Two words v, v' are said to be conjugates one of the other if there exist A, p such that v = Xp and 
v' = pX. 

If a nonempty word x is such that x = x\X2 ■ ■ ■ Xk, with x.- L € A for 1 < i < k, then k is called the 
length of x and is denoted \x\. The length of the empty word is taken to be 0. 

We say that a word v is a. factor of another word w if there exist two words A, p such that w = Xvp. 
If A = e (resp. p = e) we call v a prefix (resp. a suffix) of w. If v is both a prefix and a suffix of w, we 
say that v is a border. A factor v of w is called proper if \v\ < \w\. We denote with Fact(ty) the set of 
all factors of the word w. A word w is said to be unbordered if the only borders of w are w and e. 

Most of the above definitions can be extended to the set A u of infinite words on the alphabet A. For 
w,w' e A u , we say w' is a tail of w if w = vw' for some v € A*. If v is not empty, we call w' a proper 
tail of if . 

We call an occurrence of v in u; a word A such that Xv is a prefix of w. An infinite word w is said to 
be recurrent if each of its factors (or, equivalently, of its prefixes) has infinitely many occurrences in w. 
Given v , w <G A* we let \w\ v denote the number of occurrences of v in w and set 

Alpfi(u>) = {x e A | \w\ x > 0}. 

A factor v of is unioccurrent if |u;|„ = 1, i.e., if v occurs in w exactly once. 

We say that an infinite word w is periodic if it can be expressed as an infinite concatenation of a 
finite word v, i.e. w = v u . We say that an infinite word is ultimately periodic if it has a periodic tail. 
Otherwise we say w is aperiodic. It is easy to show that any infinite word that contains itself as a proper 
tail is periodic. 

2.2. Lexicographic order 

Let A be an alphabet equipped with a total order < . Then < extends naturally to a partial order on A*, 
denoted <i cx , in the following way: We write v <\ cx v' (and say v is lexicographically smaller than v', ) 
if \v\ = \v'\ and there exists a word A and two letters a < b such that Xa is a prefix of v and A6 is a prefix 
of v'. Two words v, v' are said to be lexicographically consecutive or adjacent if v <i cx v' and there is 
no word w such that v <\ cx w <\ cx v'. 

We say a factor v of a word w is maximal (resp. minimal) in w if there exists no factor v' such that 
v <icx v' (resp. v' <\ cx v), thus omitting the sentence "with respect to the lexicographic order". We will 
say that v is extremal in w if it is either minimal or maximal. 

Given two factors v, v' of a word w such that v <\ cx v', we will write v' = succ„,(f ) if there is no 
/ € Fact(ttj) such that v <\ cx f <\ cx v'. Notice that if v € Fact(ty) is non extremal, then there exist 
/i> ji ^ Fact (to) such that fi = succ w (w) and v = sucCu,^). 

Remark 2.1. It is easy to show that if v is a unioccurrent prefix of an infinite word w and v is extremal 
in w, then every prefix of w longer than v is unioccurrent, and extremal of the same kind. 
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We can extend the definition of lexicographic order to infinite words in a natural way, saying that the 
infinite word to is lexicographically smaller than w' if w has a prefix which is lexicographically smaller 
than a prefix (of the same length) of to'. The notion of extremality extends as well: we say that an infinite 
word w is minimal (resp. maximal) if it is lexicographically smaller (resp. larger) than all its tails. 

Remark 2.2. It is clear that if aw and to are both extremal infinite binary words (and a is a letter), then 
they are extremal of the same kind (i.e. they are both minimal or maximal). 

2.3. Sturmian words 

Let v and v' be factors of to with \v\ = \v'\. We say the pair is bcilcificed if — l^'lx 

I < 1 for 

each letter x G A. Otherwise the pair (v, v') is said to be imbalanced. A word to is called balanced if all 
pairs of factors of to of the same length are balanced. 

A binary word w is called Sturmian if to is aperiodic and balanced. As mentioned earlier, Sturmian 
words are also defined in terms of the block complexity function p w : N — >■ N which assigns to each n 
the number of distinct factors of to of length n: w is Sturmian if and only if p w (n) = n + 1 for each 
n > 0. 

For each Sturmian word to G {0, 1} W we set 

n w = {w' G {0, 1} U | Fact(t</) = Fact («;)}. 

Thus Q, w is the shift orbit closure or subshift generated by to. The proof of the following proposition is 
inEQ. 

Proposition 2.3. Let to be a Sturmian word over the alphabet {0, 1}. Then there exists a unique word 7 
in Q w such that both O7 and I7 are in Q w . 

Remark 2.4. The word 7 in Proposition 12.3 1 is called the characteristic word of to and it is known that 
the prefixes of O7 are lexicographically minimal among the factors of to, while the prefixes of 1 7 are 
maximal. 

We say that a factor v of a Sturmian word to is a Christoffel word if v is unbordered. We group into 
the next statement the well-known properties of Christoffel words that we will need in the rest of the 
paper (see for instance EI3, Prop. 5], E Prop. 6]). 

Proposition 2.5. Let to be a Sturmian word over the alphabet {0, 1} and let v G Fact(to) be a Christoffel 
word such that \v \ > 1. Then there exists u such that: 

1. v is either Ottl or lttO, and they are both Christoffel words in w; 

2. Oul and luO are the only Christoffel words of length \v\ in to and are conjugates; 

3. all conjugates of v are in Fact(to); 

4. exactly one between OuO and lul is a factor of to and is extremal in to; 

5. the factors of to of length \v\ are either conjugates of v or of type xux. 
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Those factors of a Sturmian word having the same length as a Christoffel word, but not conjugate to 
a Christoffel word (i.e. the factors xux in the preceding proposition), are called singular words of the 
Sturmian word. 



3. Main Result

We begin with the following key proposition:

Proposition 3.1. Let $w \in \{0, 1\}^\omega$ be an imbalanced word. Then there exists a factor $u \in \mathrm{Fact}(w)$ of
minimal length such that $0u0$ and $1u1$ are in $\mathrm{Fact}(w)$. Furthermore, either $10u0$ and $01u1$ are both factors
of $w$, or there exists a unique letter $x$ such that $xux$ is a prefix of $w$ and occurs in $w$ only finitely many
times. In the latter case every prefix of $w$ is extremal in $w$.

Proof:

Since $w$ is not balanced, there exists an imbalanced pair $(v, v')$ consisting of factors $v$ and $v'$ of $w$. It
is well known (see [3]) that the imbalanced pair of minimal length is of the form $(0u0, 1u1)$ for some
factor $u$ of $w$ and is unique. If both $10u0$ and $01u1$ are factors of $w$ we are done. So let us assume that there
exists a letter $x \in \{0, 1\}$ such that no occurrence of $xux$ in $w$ is preceded by $1 - x$. Then every internal
(non-prefix) occurrence of $xux$ in $w$ is preceded by $x$. We begin by showing that $xux$ is a prefix of $w$,
from which it follows that $x$ is unique. Without loss of generality we can assume that $x = 0$. Suppose
that the first occurrence of $0u0$ in $w$ occurs in position $n \geq 0$. If $n = 0$ we are done. So suppose
$n > 0$. Then $00u$ is a factor of $w$ occurring in position $n - 1$ and the pair $(00u, 1u1)$ is imbalanced.
By uniqueness of the shortest imbalanced pair we have that $00u = 0u0$ and hence $0u0$ also occurs in
position $n - 1$, contradicting the minimality of $n$. This also shows that if $0u0$ occurs in position $t$
then it also occurs in each position $r$ for $0 \leq r \leq t$. Thus $0u0$ occurs only finitely many times in $w$ (for
otherwise $w$ would be $0^\omega$ and thus not binary).

We next show that every prefix of $w$ is minimal (if we had taken $x = 1$ then each prefix of $w$ would
be maximal). We proceed by contradiction. Let $n > 0$ be the least positive integer for which there exists
a factor $v'$ of $w$ in position $n$ which is lexicographically smaller than the corresponding prefix $v$ of $w$ of
the same length. Then either there exists a proper prefix $u'$ of $u$ such that $0u'1$ is a prefix of $v$ and $0u'0$
is a prefix of $v'$, or $v'$ begins in $0u0$. In the first case $0u'0$ and the prefix $1u'1$ of $1u1$ constitute a shorter
imbalanced pair, contradicting the minimality of $|u|$. In the second case $v'$ is an internal occurrence of
$0u0$ and is hence preceded by $0$. Thus the factor $v''$ in position $n - 1$ of length $|v'|$ is lexicographically
smaller than $v'$ and also smaller than $v$, contradicting the minimality of $n$. □

The next proposition introduces the main subject of this paper:

Proposition 3.2. Fix $k \geq 1$. Let $A = \{0, 1, \ldots, k\}$ be an ordered alphabet such that $0 < 1 < \cdots < k$
and $w$ an infinite word such that $\mathrm{Alph}(w) = A$. The following are equivalent:

1. For every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist distinct letters $a < b$ in $A$ and $\lambda, \mu \in A^*$
such that

\[
\begin{cases} v = \lambda ab\mu \\ v' = \lambda ba\mu \end{cases}
\quad \text{or} \quad
\begin{cases} v = \lambda a \\ v' = \lambda b \end{cases}
\]

2. For every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist $m \in A$ and $\lambda, \mu \in A^*$ such that

\[
\begin{cases} v = \lambda m(m+1)\mu \\ v' = \lambda (m+1)m\mu \end{cases}
\quad \text{or} \quad
\begin{cases} v = \lambda m \\ v' = \lambda (m+1) \end{cases}
\]

3. $A = \{0, 1\}$ and for every $v, v' \in \mathrm{Fact}(w)$ with $v' = \mathrm{succ}_w(v)$, there exist $\lambda, \mu \in A^*$ such that

\[
\begin{cases} v = \lambda 01\mu \\ v' = \lambda 10\mu \end{cases}
\quad \text{or} \quad
\begin{cases} v = \lambda 0 \\ v' = \lambda 1 \end{cases}
\]

Proof:

Clearly $(3) \Rightarrow (2) \Rightarrow (1)$. To see that $(1) \Rightarrow (3)$ it suffices to show that (1) implies that $k = 1$. We first
note that if $ab \in \mathrm{Fact}(w)$, then $b \in \{a - 1, a, a + 1\}$. In fact suppose that $a \neq b$. Then either $a < b$ or
$b < a$. We consider the first case as the latter case is essentially identical. Let $x, y \in A$ be such that $ax$ is
the greatest factor of length 2 beginning with $a$ and $(a + 1)y$ is the smallest factor of length 2 beginning
with $(a + 1)$. Clearly $(a + 1)y = \mathrm{succ}_w(ax)$, which, from the hypothesis, implies that $x = a + 1$ and
$y = a$. Thus $ab$ is lexicographically smaller than or equal to $a(a + 1)$, from which it follows that $b = a + 1$.

Now suppose to the contrary that $k > 1$, and consider the shortest factor $v$ of $w$ containing both $0$
and $2$. Then, from what we just proved, $v = 01^n2$ or $v = 21^n0$ for some $n > 0$. We will show that
neither occurs in $w$. Suppose to the contrary that the first is a factor of $w$ and consider the least $n > 0$ for
which $01^n2$ is a factor of $w$. Then as $01^n2$ is not maximal, its successor is either of the form $101^{n-1}2$
or $01^{n-1}21$ or $01^nx$ for some $2 < x$. The first two cases contradict the minimality of $n$ while the last
case implies that $1x$ is a factor of $w$ for some $2 < x$, again a contradiction. Similarly it is verified that
$v = 21^n0$ is never a factor of $w$. Hence $k = 1$. □

Definition 3.3. We say that an infinite word $w$ has the "Nice Factors Ordering property" (NFOp) if
one of the equivalent conditions of Proposition 3.2 holds for $w$.

Remark 3.4. It is useful to stress that having the NFOp implies that the word $w$ is actually binary.
Also it is easy to see that NFOp actually characterizes the pairs of adjacent factors with respect to the
lexicographic ordering, i.e., if $w$ satisfies NFOp and $v$ and $v'$ are factors of $w$ with $v = \lambda 01\mu$ and
$v' = \lambda 10\mu$, or $v = \lambda 0$ and $v' = \lambda 1$, then $v' = \mathrm{succ}_w(v)$.
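
The NFOp can be tested on finite data. The sketch below is only an approximation of the definition: it inspects the factors of a finite prefix up to a chosen length, so a positive answer is evidence rather than a proof. It checks condition (3) of Proposition 3.2 and the adjacency statement of Remark 3.4. All function names, the prefix lengths and the bound 12 are assumptions made for this example.

```python
def fibonacci_word(n):
    """First n letters of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < n:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:n]


def factors(w, n):
    """Sorted list of the distinct length-n factors of the finite word w."""
    return sorted({w[i:i + n] for i in range(len(w) - n + 1)})


def is_nfop_pair(v, vp):
    """True if (v, vp) has the shape (x01y, x10y) or (x0, x1)."""
    if len(v) != len(vp):
        return False
    diff = [i for i in range(len(v)) if v[i] != vp[i]]
    if len(diff) == 1:
        i = diff[0]
        return i == len(v) - 1 and v[i] == "0" and vp[i] == "1"
    if len(diff) == 2:
        i, j = diff
        return j == i + 1 and v[i:j + 1] == "01" and vp[i:j + 1] == "10"
    return False


def has_nfop_up_to(w, max_len):
    """Check that lexicographically consecutive factors always form an NFOp pair."""
    for n in range(1, max_len + 1):
        facts = factors(w, n)
        if any(not is_nfop_pair(a, b) for a, b in zip(facts, facts[1:])):
            return False
    return True


def shaped_pairs_are_adjacent(w, max_len):
    """Check that no two non-adjacent factors form an NFOp pair (Remark 3.4)."""
    for n in range(1, max_len + 1):
        facts = factors(w, n)
        for i in range(len(facts)):
            for j in range(i + 2, len(facts)):
                if is_nfop_pair(facts[i], facts[j]):
                    return False
    return True


print(has_nfop_up_to(fibonacci_word(500), 12))
print(shaped_pairs_are_adjacent(fibonacci_word(500), 12))
print(has_nfop_up_to("01" * 100, 12))  # prefix of the periodic word (01)^omega
```

The periodic example fails already at length 3, since its only factors of that length, `010` and `101`, differ in three positions.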

Lemma 3.5. If an infinite word $w$ has the NFOp, then $w$ is aperiodic.
Proof:

Let us assume by contradiction that there exist $w', v \in A^*$ with $w = w'v^\omega$. Then $w$ has finitely many
tails and it is readily proved that they must respect the NFOp, i.e. if $x$ and $y$ are two lexicographically
consecutive tails of $w$ then we can write

\[ x = z01z', \qquad y = z10z'. \]

In particular, this implies that every tail contains either $01$ or $10$, hence $v$ cannot be a single letter. As
$v^\omega$ contains both $01$ and $10$, $w$ contains tails of the form $\alpha = (01v')^\omega$ and $\beta = (10v'')^\omega$ for some $v', v''$ with
$|v'| = |v''| = |v| - 2$. Clearly these two tails differ in an infinite number of positions. On the other hand
$w$ has only a finite number of tails and by assumption any two lexicographically consecutive tails differ
in exactly two positions. Hence we obtain a contradiction. □
Lemma 3.6. Let $w$ be an infinite word with the NFOp. Then there exists no factor $u$ in $\mathrm{Fact}(w)$ such
that $10u0$ and $01u1$ are both factors of $w$.

Proof:

Suppose to the contrary that there exists a shortest factor $u$ such that both $10u0$ and $01u1$ are factors
of $w$. We have $01u1 <_{lex} 10u0$, but the two factors cannot be consecutive as they differ in at least three
positions, so the successor $v$ of $01u1$ satisfies $01u1 <_{lex} v <_{lex} 10u0$. It follows that there exists a proper
prefix $\lambda$ of $u$ such that $01\lambda 0$ is a prefix of $01u1$ and $01\lambda 1$ is a prefix of $v$ (notice that $v$ cannot begin with
$1$ since otherwise it would be $10u1$ and thus would be lexicographically larger than $10u0$). Since $10\lambda 0$
is a prefix of $10u0$, the factors $01\lambda 1$ and $10\lambda 0$ contradict the minimality of $|u|$. □
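
The configuration excluded by Lemma 3.6 can be searched for mechanically. The following sketch looks for the shortest $u$ such that both $10u0$ and $01u1$ occur in a finite prefix. The function names are assumptions made for this example, and the Thue-Morse word is used as a contrasting binary word that does not appear in the text above.

```python
def fibonacci_word(n):
    """First n letters of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < n:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:n]


def thue_morse_word(n):
    """First n letters of the Thue-Morse word: letter i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def shortest_witness(w, max_u_len):
    """Return the shortest u with 10u0 and 01u1 both factors of w, or None if there is none."""
    for n in range(max_u_len + 1):  # n = |u|, so the factors have length n + 3
        facts = {w[i:i + n + 3] for i in range(len(w) - n - 2)}
        for v in sorted(facts):
            if v.startswith("10") and v.endswith("0"):
                u = v[2:-1]
                if "01" + u + "1" in facts:
                    return u
    return None


print(shortest_witness(fibonacci_word(500), 10))
print(repr(shortest_witness(thue_morse_word(64), 5)))
```

No witness is found in the Fibonacci prefix, as the lemma predicts for a word with the NFOp. In the Thue-Morse prefix the empty word is a witness, because both `100` and `011` occur.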

The following result is a direct consequence of a result proved by the third author together with
Jenkinson in [?] and, more recently, has appeared in [10]; we include it here with a different proof, for
the sake of completeness.

Proposition 3.7. Let $w$ be a Sturmian word on the alphabet $\{0, 1\}$. Then $w$ satisfies NFOp.
Proof:

Let $0u1$ be a Christoffel factor of $w$. As $1u0$ is a conjugate of $0u1$ it follows that each factoring $u = xy$
determines two conjugates of $0u1$, namely $v = y01x$ and $v' = y10x$. By Proposition 2.5, $v$ and $v'$ are
factors of $w$; let $z = \mathrm{succ}_w(v)$. As $v <_{lex} v'$, the longest common prefix of $v$ and $z$ is at least $y$. In fact
it cannot be longer, otherwise we could write $v = y01x'0\lambda$ and $z = y01x'1\mu$ for some words $x', \lambda, \mu$; as
$v' = y10x'0\lambda$, we would have $0x'0, 1x'1 \in \mathrm{Fact}(w)$, a contradiction since $w$ is balanced.

Similarly, $y$ is also the longest common prefix between $v'$ and the word $z'$ such that $\mathrm{succ}_w(z') = v'$.
It follows that $z = v'$ and $v = z'$, i.e., $v$ and $v'$ are lexicographically consecutive. Thus any two consecutive
conjugates of a Christoffel word in $w$ satisfy the first condition in (3) of Proposition 3.2. More generally,
if $z$ and $z'$ are lexicographically consecutive factors of $w$, then there exists a Christoffel factor $0u1$ and
two consecutive conjugates $v$ and $v'$ of $0u1$ with $z$ a prefix of $v$ and $z'$ a prefix of $v'$. The result now
follows. □
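
The key step of this proof, that lexicographically consecutive conjugates of a Christoffel word differ by replacing one occurrence of $01$ with $10$, can be observed on small examples. The sketch below builds a Christoffel word from the standard modular rule, which is an assumption imported for this example and is not taken from the text above; the function names are likewise hypothetical.

```python
from math import gcd


def christoffel_word(a, b):
    """Lower Christoffel word with a zeros and b ones (a, b >= 1 coprime), via the modular rule."""
    if a < 1 or b < 1 or gcd(a, b) != 1:
        raise ValueError("a and b must be coprime positive integers")
    n = a + b
    return "".join("0" if (i * b) % n < ((i + 1) * b) % n else "1" for i in range(n))


def sorted_conjugates(v):
    """Distinct cyclic rotations of v in lexicographic order."""
    return sorted({v[i:] + v[:i] for i in range(len(v))})


def is_swap_01_to_10(v, vp):
    """True if vp is obtained from v by replacing one occurrence of 01 with 10."""
    return any(
        v[i:i + 2] == "01" and vp == v[:i] + "10" + v[i + 2:]
        for i in range(len(v) - 1)
    )


def conjugates_form_chain(v):
    """True if consecutive conjugates of v always differ by a single 01 -> 10 swap."""
    conj = sorted_conjugates(v)
    return all(is_swap_01_to_10(x, y) for x, y in zip(conj, conj[1:]))


word = christoffel_word(3, 2)
print(word, sorted_conjugates(word), conjugates_form_chain(word))
```

For the word `00101` the sorted conjugates are `00101`, `01001`, `01010`, `10010`, `10100`, and each step is a single swap. A word such as `0011`, which is not a Christoffel word, does not have this property.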

Before proceeding to prove our main result, we need the following: 

Lemma 3.8. Let $w \in \{0, 1\}^\omega$ be a Sturmian word and $x \in \{0, 1\}$. If $xw$ satisfies NFOp then $xw$ is
Sturmian.

Proof:

We proceed by contradiction by supposing that $w$ is Sturmian, $xw$ satisfies NFOp and that $xw$ is not
Sturmian. Without loss of generality we can assume that $x = 0$. It follows that there exists $u$ such that
both $0u0$ and $1u1$ are factors of $0w$. Since $w$ is Sturmian, it follows from Proposition 3.1 that $0u0$ is a
unioccurrent prefix of $0w$ and every prefix of $0w$ is minimal in $0w$. On the other hand, if $\gamma$ denotes the
characteristic word of $w$ (which has $u$ as a prefix), then every prefix of $0\gamma$ is minimal in $w$. By NFOp
it follows that for all $n > |u| + 2$ the prefixes of length $n$ of $0w$ and $0\gamma$ can be written respectively as
$0u01v_n$ and $0u10v_n$ for some word $v_n$. Hence there exists a tail $v$ of $w$ such that $0w = 0u01v$ and
$0\gamma = 0u10v$. Thus $0v, 1v \in \Omega_w$, so that $v = \gamma$ and hence $\gamma$ is a proper tail of itself, a contradiction. □
Theorem 3.9. Let $w$ be an infinite word on the ordered alphabet $A = \{0, 1, \ldots, k\}$. The following
statements are equivalent:

1. $w$ is Sturmian over the alphabet $\{0, 1\}$.

2. $w$ satisfies NFOp.
Proof:

That $(1) \Rightarrow (2)$ follows from Proposition 3.7. To see that $(2) \Rightarrow (1)$, we suppose that $w \in \{0, 1\}^\omega$
satisfies NFOp and write $w = aw'$ with $a \in \{0, 1\}$. We need to show that $w$ is Sturmian. By Lemma 3.5,
$w$ is aperiodic. If $w$ is not Sturmian, then by Lemma 3.8 we deduce that $w'$ is not Sturmian. Also,
combining Proposition 3.1 and Lemma 3.6 we deduce that every prefix of $w$ is extremal and hence $w'$
also satisfies NFOp. In short, if $w = aw'$ satisfies NFOp and is not Sturmian, then every prefix of $w$ is
extremal and the tail $w'$ satisfies NFOp and is not Sturmian. Thus writing $w' = bw''$ we deduce that every
prefix of $w'$ is extremal and $w''$ satisfies NFOp and is not Sturmian. Iterating this process indefinitely we
deduce that for each tail $v$ of $w$, each prefix of $v$ is extremal in $v$. Since $w$ is aperiodic it follows that
there exists a tail $v$ of $w$ which begins in $01$ and a tail $v'$ of $v$ which begins in $10$. Since every prefix of $v$
is minimal in $v$ and every prefix of $v'$ is maximal in $v'$ it follows that $00$ is not a factor of $v$ and $11$ is not
a factor of $v'$. Hence $v' = (10)^\omega$, a contradiction. □

We next establish another characterization of Sturmian words based on the lexicographic order of 
their factors. 

Theorem 3.10. Let $w$ be a recurrent aperiodic binary word over the alphabet $\{0, 1\}$. Then the following
are equivalent:

1. $w$ is Sturmian.

2. For all factors $v, v' \in \mathrm{Fact}(w)$ of equal length, if $v' = \mathrm{succ}_w(v)$ then $v$ and $v'$ differ in at most
two positions.

3. For all factors $v, v' \in \mathrm{Fact}(w)$ of equal length, if $v <_{lex} v'$ then $|v|_1 \leq |v'|_1$.
Proof:

$(1) \Rightarrow (2)$: Since $w$ is Sturmian, it has the NFOp by Theorem 3.9. The statement follows since
the NFOp trivially implies condition (2) by definition.

$(2) \Rightarrow (3)$: Notice that condition (2) implies that if $f' = \mathrm{succ}_w(f)$, then there must exist words $\lambda, \mu, \mu', x, x'$
with $|x| = |x'| \leq 1$ such that $f = \lambda 0\mu x\mu'$ and $f' = \lambda 1\mu x'\mu'$. Hence

\[
|f|_1 = |\lambda|_1 + |\mu|_1 + |\mu'|_1 + |x|_1 \leq |\lambda|_1 + |\mu|_1 + |\mu'|_1 + 1 \leq |\lambda|_1 + |\mu|_1 + |\mu'|_1 + |x'|_1 + 1 = |f'|_1.
\]

And thus, in particular, $|f|_1 \leq |f'|_1$. Suppose $v <_{lex} v'$. Then there must exist $v_0, \ldots, v_k$ such that
$v = v_0$, $v' = v_k$ and for all $1 \leq n \leq k$, $v_n = \mathrm{succ}_w(v_{n-1})$; then

\[
|v|_1 = |v_0|_1 \leq \cdots \leq |v_k|_1 = |v'|_1.
\]

$(3) \Rightarrow (1)$: Assume $w$ is not Sturmian; as $w$ is aperiodic, it has to be imbalanced. Since $w$ is
recurrent, we have from Proposition 3.1 that there must exist $u$ such that both $10u0$ and $01u1$ are factors
of $w$. But clearly this is a contradiction, since $01u1 <_{lex} 10u0$ and $|01u1|_1 = |10u0|_1 + 1$. This
concludes the proof. □
Notice that, as opposed to Theorem 3.9, the above result actually needs the recurrence and aperiodicity
hypotheses, as for example:

• the recurrent periodic word $(01)^\omega$ and the non-recurrent aperiodic word $00f$ (where $f$ is the
Fibonacci word) both respect condition (3), although neither is Sturmian,

• the non-recurrent ultimately periodic word $01^\omega$ satisfies both (2) and (3), but it is not Sturmian.
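
Conditions (2) and (3) of Theorem 3.10 and the examples above can be explored numerically. The sketch below evaluates both conditions on the factors of finite prefixes, up to a fixed length. It is a finite approximation under stated assumptions: the function names, the prefix length 500 and the bound 10 are choices made for this example, and a positive answer on a prefix does not prove the condition for the infinite word.

```python
def fibonacci_word(n):
    """First n letters of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < n:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:n]


def factors(w, n):
    """Sorted list of the distinct length-n factors of the finite word w."""
    return sorted({w[i:i + n] for i in range(len(w) - n + 1)})


def satisfies_condition_2(w, max_len):
    """Consecutive factors of each length differ in at most two positions."""
    for n in range(1, max_len + 1):
        facts = factors(w, n)
        for v, vp in zip(facts, facts[1:]):
            if sum(a != b for a, b in zip(v, vp)) > 2:
                return False
    return True


def satisfies_condition_3(w, max_len):
    """The number of 1s never decreases along the lexicographic order of factors."""
    for n in range(1, max_len + 1):
        ones = [v.count("1") for v in factors(w, n)]
        if any(x > y for x, y in zip(ones, ones[1:])):
            return False
    return True


examples = {
    "f": fibonacci_word(500),
    "(01)^omega": "01" * 250,
    "00f": "00" + fibonacci_word(498),
    "01^omega": "0" + "1" * 499,
}
for name, prefix in examples.items():
    print(name, satisfies_condition_2(prefix, 10), satisfies_condition_3(prefix, 10))
```

On these prefixes the Fibonacci word and $01^\omega$ pass both checks, while $(01)^\omega$ and $00f$ pass the check for (3) only: the first fails (2) at length 3 (`010` and `101`), the second at length 6 (`000100` and `001001`).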